A rough set approach to outlier detection

Authors

  • Feng Jiang
  • Yuefei Sui
  • Cungen Cao
Abstract

"One person's noise is another person's signal" (Knorr and Ng 1998).

In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers, that is, objects that behave in an unexpected way or have abnormal properties. Detecting such outliers is important for many applications, such as detecting criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, and agricultural pest infestations. In this paper, we propose exploiting the framework of rough sets for detecting outliers. We propose a novel definition of outliers, RMF (rough membership function)-based outliers, by virtue of the notion of rough membership function in rough set theory. An algorithm to find such outliers is also given. The effectiveness of the RMF-based method is demonstrated on two publicly available data sets.

1. Introduction

Knowledge discovery in databases (KDD), or data mining, is an important issue in the development of data- and knowledge-base systems. Usually, knowledge discovery tasks can be classified into four general categories: (a) dependency detection, (b) class identification, (c) class description, and (d) outlier/exception detection (Knorr and Ng 1998).
In contrast to most KDD tasks, such as clustering and classification, outlier detection aims to find small groups of data objects that are exceptional when compared with the remaining large amount of data, in terms of certain sets of properties. For many applications, such as fraud detection in e-commerce, it is more interesting to find the rare events than to find the common ones. Studying the extraordinary behaviours of outliers can help us uncover the valuable information hidden behind them. Recently, researchers have begun focusing on outlier detection and have attempted to design algorithms for tasks such as fraud detection (Bolton and Hand 2002), identification of computer network …
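
The abstract builds on the rough membership function of rough set theory. The paper's own RMF-based outlier definition and algorithm are not reproduced in this excerpt, so what follows is only a minimal Python sketch of the standard rough membership function, μ(x) = |[x]_B ∩ X| / |[x]_B|, where [x]_B is the set of objects indistinguishable from x on an attribute subset B and X is a target set of objects. The toy table, the attribute names and the function name `rough_membership` are hypothetical and chosen purely for illustration.

```python
def rough_membership(table, attributes, target, obj_id):
    """Standard rough membership of obj_id in `target` w.r.t. `attributes`.

    table:      dict mapping object id -> dict of attribute values
    attributes: attribute names defining the indiscernibility relation B
    target:     set of object ids (the set X)
    Returns |[x]_B intersect X| / |[x]_B|, a value in [0, 1].
    """
    # Values of the chosen attributes for the object of interest.
    key = tuple(table[obj_id][a] for a in attributes)
    # Equivalence class [x]_B: all objects with identical values on B.
    block = {
        other
        for other, row in table.items()
        if tuple(row[a] for a in attributes) == key
    }
    return len(block & target) / len(block)


# Hypothetical toy information table (not taken from the paper).
table = {
    "u1": {"colour": "red", "size": "small"},
    "u2": {"colour": "red", "size": "small"},
    "u3": {"colour": "red", "size": "large"},
    "u4": {"colour": "blue", "size": "large"},
}
target = {"u1", "u3"}

by_colour = rough_membership(table, ["colour"], target, "u1")
by_both = rough_membership(table, ["colour", "size"], target, "u1")
```

In this toy table, `u1` shares its colour with `u2` and `u3`, so its membership in the target set depends on which attributes are used to compare objects. How the paper turns such membership degrees into an outlier score is not stated in the text shown here.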

Similar resources

A Novel Approach for Outlier Detection using Rough Entropy

Outlier detection is an important task in data mining and its applications. An outlier is defined as a data point that is very different from the rest of the data according to some measure. Such data often contain useful information on abnormal behavior of the system described by patterns. In this paper, a novel method for outlier detection in inconsistent datasets is proposed. This method expl...

A Rough Set Approach to Spatio-temporal Outlier Detection

Detecting outliers which are grossly different from or inconsistent with the remaining spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we deal with the outlier detection problem in spatio-temporal data and we describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed ...

Some issues about outlier detection in rough set theory

“One person’s noise is another person’s signal” (Knorr, E., Ng, R. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th VLDB conference, New York (pp. 392–403)). In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers – objects which behave in an unexpected way or have abnormal properties....

Finding Anomaly With Fuzzy C-means ANN Using Semi-Supervised Approach

The FC-ANN (Artificial Neural Network) is used to speed up the technique. Outlier (anomaly) detection is a primary task in various data-mining applications. Outlier detection methods have been suggested for a number of applications, such as fraud detection, voting irregularity analysis, data cleansing, clinical trials, network intrusion, severe weather prediction, geographic information system, credit ...

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Rough K-means Outlier Factor Based on Entropy Computation

Many studies of outlier detection have been developed based on the cluster-based outlier detection approach, since it does not need any prior knowledge of the dataset. However, the previous studies only regard the outlier factor computation with respect to a single point or a small cluster, which reflects its deviation from a common cluster. Furthermore, all objects within an outlier cluster are as...

Journal:
  • Int. J. General Systems

Volume 37

Publication date: 2008